Corpus Analysis for Lexical Database Construction: A Case of Russian and Czech Wordnets
Abstract
The paper deals with corpus-based methods applied to particular tasks in lexical database construction. Different corpus analysis techniques are discussed and their applicability to these tasks is assessed. The corpus management system Manatee + Bonito, developed at the Faculty of Informatics, Masaryk University in Brno, Czech Republic, is presented as a tool that supports all of the linguistic studies discussed. We focus mainly on substitution methods and the extraction of lexico-syntactic patterns, which are standard approaches to the creation of lexical databases. We also briefly mention the use of word sketches, a new technique in lexicography aimed at speeding up corpus analysis.
Related Resources
FrameBank: A Database of Russian Lexical Constructions
Russian FrameBank is a bank of annotated samples from the Russian National Corpus which documents the use of lexical constructions (e.g. argument constructions of verbs and nouns). FrameBank belongs to FrameNet-oriented resources, but unlike Berkeley FrameNet it focuses more on the morphosyntactic and semantic features of individual lexemes rather than the generalized frames, following the theor...
Development of Russian lexical databases, corpora and supporting tools for speech products
The situation with regard to Russian language resources is fragmented and disorganized. For this reason, it is important to promote the development of basic Russian resources as a single package that could be used to develop speech products. The paper presents a design of the Russian lexical databases, corpora and supporting tools (system for construction and support of lexical datab...
Bank of Russian Constructions and Valencies
The Bank of Russian Constructions and Valencies (Russian FrameBank) is an annotation project that takes as input samples from the Russian National Corpus (http://www.ruscorpora.ru). Since Russian verbs and predicates from other POS classes have their own, not always predictable, case patterns, these words and their argument structures are to be described as lexical constructions. The sl...
The Automatic Mapping of Princeton WordNet Lexical-Conceptual Relations onto the Brazilian Portuguese WordNet Database
The Princeton WordNet (WN.Pr) lexical database has motivated efficient compilations of bulky relational lexicons since its inception in the 1980s. The EuroWordNet project, the first multilingual initiative built upon WN.Pr, opened up ways of building individual wordnets, and interrelating them by means of the so-called Inter-Lingual-Index, an unstructured list of the WN.Pr synsets. Other important...
Automated WordNet Construction Using Word Embeddings
We present a fully unsupervised method for automated construction of WordNets based upon recent advances in distributional representations of sentences and word-senses combined with readily available machine translation tools. The approach requires very few linguistic resources and is thus extensible to multiple target languages. To evaluate our method we construct two 600-word test sets for wo...